
    The Use of Rough Classification and Two Threshold Two Divisors for Deduplication

    The data deduplication technique efficiently reduces and removes redundant data in big data storage systems. The main issue is that data deduplication requires expensive computational effort to remove duplicate data because of the vast size of big data. This paper attempts to reduce the time and computation required for the data deduplication stages; the chunking and hashing stage in particular often requires extensive calculation and time. The paper proposes an efficient new method that exploits parallel processing in deduplication systems and is designed to use multicore computing efficiently. First, the proposed method removes redundant data by roughly classifying the input into several classes using histogram similarity and the k-means algorithm. Next, a new method for calculating the divisor list of each class is introduced to improve the chunking method and increase the data deduplication ratio. Finally, the performance of the proposed method was evaluated using three datasets as test examples. The results show that class-based data deduplication on a multicore processor is much faster than on a single-core processor. Moreover, the experimental results showed that the proposed method significantly improved the performance of the Two Threshold Two Divisors (TTTD) and Basic Sliding Window (BSW) algorithms.
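
    The paper's exact implementation is not reproduced here; the following is a minimal Python sketch of the rough-classification idea: files are described by byte-value histograms, grouped with a plain k-means, and each class is then deduplicated on its own worker core. The feature size, chunk size, and all function names are illustrative assumptions, not the published algorithm.

        import hashlib
        from multiprocessing import Pool

        import numpy as np

        def byte_histogram(path, bins=64):
            """Normalized histogram of byte values, used as a rough feature."""
            data = np.fromfile(path, dtype=np.uint8)
            hist, _ = np.histogram(data, bins=bins, range=(0, 256))
            return hist / max(hist.sum(), 1)

        def kmeans(features, k, iters=20, seed=0):
            """Plain k-means; returns one class label per feature vector."""
            rng = np.random.default_rng(seed)
            centers = features[rng.choice(len(features), k, replace=False)]
            for _ in range(iters):
                dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
                labels = np.argmin(dists, axis=1)
                for j in range(k):
                    if (labels == j).any():
                        centers[j] = features[labels == j].mean(axis=0)
            return labels

        def dedup_class(paths):
            """Hash fixed-size chunks within one class; count unique chunks."""
            seen = set()
            for path in paths:
                with open(path, "rb") as f:
                    while chunk := f.read(4096):
                        seen.add(hashlib.sha1(chunk).digest())
            return len(seen)

        def dedup(paths, k=4):
            feats = np.stack([byte_histogram(p) for p in paths])
            labels = kmeans(feats, min(k, len(paths)))
            classes = [[p for p, l in zip(paths, labels) if l == j] for j in range(k)]
            with Pool() as pool:  # one class per worker core
                return sum(pool.map(dedup_class, [c for c in classes if c]))

    A content-defined chunker such as TTTD or BSW would replace the fixed 4 KiB chunks in dedup_class; fixed chunks are used only to keep the sketch short.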

    Image Compression Using Tap 9/7 Wavelet Transform and Quadtree Coding Scheme

    This paper is concerned with the design and implementation of an image compression method based on the biorthogonal tap-9/7 discrete wavelet transform (DWT) and quadtree coding. As a first step, the color correlation is handled by using the YUV color representation instead of RGB. Then, the chromatic sub-bands are downsampled, and the data of each color band is transformed using the wavelet transform. The produced wavelet sub-bands are quantized using a hierarchical scalar quantization method. The quantized detail coefficients are coded using quadtree coding followed by Lempel-Ziv-Welch (LZW) encoding, while the approximation coefficients are coded using delta coding followed by LZW encoding. The test results indicated that the compression results are comparable to those gained by standard compression schemes.
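
    As an illustration of the quadtree coding step, the sketch below encodes a quantized detail sub-band by pruning all-zero blocks with a single flag bit; the block sizes and bit layout are assumptions, not the paper's exact format.

        import numpy as np

        def quadtree_encode(block, min_size=2, out=None):
            """Recursively encode a square coefficient block.

            Emits 0 for an all-zero block, 1 plus four children otherwise;
            at min_size the raw leaf values are appended."""
            if out is None:
                out = []
            if not block.any():
                out.append(0)               # whole block is zero: one flag
                return out
            out.append(1)                   # block has non-zero data
            n = block.shape[0]
            if n <= min_size:
                out.extend(int(v) for v in block.ravel())  # leaf payload
                return out
            h = n // 2                      # split into four quadrants
            for q in (block[:h, :h], block[:h, h:], block[h:, :h], block[h:, h:]):
                quadtree_encode(q, min_size, out)
            return out

        # Usage: a quantized detail sub-band is mostly zeros, so the
        # zero-pruning flags dominate and the stream stays short.
        coeffs = np.zeros((8, 8), dtype=int)
        coeffs[0, 0], coeffs[5, 6] = 3, -1
        stream = quadtree_encode(coeffs)
        print(stream)  # compact flags plus leaf values, ready for LZW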

    An efficient method for stamps recognition using Haar wavelet sub-bands

    Certain organizations, such as insurance companies and government institutions, handle a huge number of documents every day, so an automated stamp recognition system is required. The stamp image may appear on different backgrounds and at different sizes, may be rotated in different directions, and may contain soft areas (patches) or small points as noise. Thus, the main objective of this paper is to extract and recognize color stamp images. The paper proposes a stamp recognition method based on Haar wavelet sub-bands. The devised method has four stages: 1) extracting the stamp image; 2) preprocessing the image; 3) feature extraction; and 4) matching. The method was implemented in C# (Microsoft Visual Studio 2012). Experiments conducted on a stamp dataset showed that the proposed method has a great capability to recognize stamps when using the Haar wavelet transform with two sets of features (i.e., a 100% recognition rate for energy features and a 99.93% recognition rate for low-order moments).
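
    The energy features can be sketched as follows, assuming a single-level Haar decomposition; PyWavelets (pywt) and the nearest-neighbor matcher are illustrative stand-ins for the paper's C# implementation.

        import numpy as np
        import pywt

        def haar_energy_features(gray):
            """Mean squared energy of each Haar sub-band (LL, LH, HL, HH)."""
            cA, (cH, cV, cD) = pywt.dwt2(gray.astype(float), "haar")
            return np.array([(b ** 2).mean() for b in (cA, cH, cV, cD)])

        def match(query, database):
            """Return the key of the stored stamp whose feature vector is
            nearest (Euclidean) to the query's; database maps name -> features."""
            q = haar_energy_features(query)
            return min(database, key=lambda k: np.linalg.norm(q - database[k]))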

    Color image compression based on spatial and magnitude signal decomposition

    In this paper, a simple color image compression system is proposed using image signal decomposition. The RGB image is converted to the less correlated YUV color model, and the pixel value (magnitude) in each band is decomposed into two values: a most significant value (MSV) and a least significant value (LSV). Because the MSV is the more important of the two, being strongly affected by even simple modifications, an adaptive lossless compression scheme is applied to it using bit-plane (BP) slicing, delta pulse code modulation (delta PCM), and adaptive quadtree (QT) partitioning followed by an adaptive shift encoder. The LSV, on the other hand, is handled by a lossy scheme based on an adaptive, error-bounded coding system that uses DCT compression. The performance of the developed system was analyzed and compared with that of the universal JPEG standard, and the results indicated that its performance is comparable to or better than that of the JPEG standard.
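
    The abstract does not state where the magnitude split occurs; the sketch below assumes a nibble-boundary split of each 8-bit pixel purely for illustration.

        import numpy as np

        def decompose(band):
            # Assumed split point: high 4 bits as MSV (coded losslessly),
            # low 4 bits as LSV (coded lossily in the paper's scheme).
            msv = band >> 4
            lsv = band & 0x0F
            return msv, lsv

        def recompose(msv, lsv):
            return (msv << 4) | lsv  # exact inverse while LSV is unmodified

        band = np.array([[200, 37], [15, 129]], dtype=np.uint8)
        msv, lsv = decompose(band)
        assert np.array_equal(recompose(msv, lsv), band)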

    A Comparative Study for String Metrics and the Feasibility of Joining them as Combined Text Similarity Measures

    This paper aims to introduce an optimized Damerau–Levenshtein and Dice-coefficient measure using enumeration operations (ODADNEN) that provides a fast string similarity measure while maintaining accurate results; searching for specific words within a large text is a hard job that takes considerable time and effort, and the string similarity measure plays a critical role in many searching problems. Different experiments were conducted to handle some spelling mistakes, and an enhanced algorithm for string similarity assessment was proposed. This algorithm combines a set of well-known algorithms with some improvements (e.g., the Dice coefficient was modified to deal with numbers instead of characters under certain conditions). These algorithms were adopted after a number of experimental tests to check their suitability. The ODADNEN algorithm was tested using real data, and its performance was compared with the original similarity measures. The results indicated that the most convincing measure is the proposed hybrid one, which uses the Damerau–Levenshtein and Dice distances based on the n-grams of each word; it also requires less processing time in comparison with the standard algorithms. Furthermore, it provides efficient results for assessing the similarity between two words without the need to restrict the word length.
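
    A hedged sketch of a hybrid measure in this spirit follows: a Damerau–Levenshtein distance (optimal string alignment variant) blended with a Dice coefficient over character bigrams. The weighting and normalization are illustrative assumptions, not the ODADNEN specification.

        def damerau_levenshtein(a, b):
            """Edit distance counting insert, delete, substitute, transpose."""
            d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
            for i in range(len(a) + 1):
                d[i][0] = i
            for j in range(len(b) + 1):
                d[0][j] = j
            for i in range(1, len(a) + 1):
                for j in range(1, len(b) + 1):
                    cost = 0 if a[i - 1] == b[j - 1] else 1
                    d[i][j] = min(d[i - 1][j] + 1,         # deletion
                                  d[i][j - 1] + 1,         # insertion
                                  d[i - 1][j - 1] + cost)  # substitution
                    if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                        d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
            return d[len(a)][len(b)]

        def dice_bigrams(a, b):
            """Dice coefficient over character bigram multisets."""
            ga = [a[i:i + 2] for i in range(len(a) - 1)]
            gb = [b[i:i + 2] for i in range(len(b) - 1)]
            if not ga or not gb:
                return float(a == b)
            overlap = sum(min(ga.count(g), gb.count(g)) for g in set(ga))
            return 2 * overlap / (len(ga) + len(gb))

        def combined_similarity(a, b, w=0.5):
            """Weighted blend of normalized edit similarity and Dice similarity."""
            edit_sim = 1 - damerau_levenshtein(a, b) / max(len(a), len(b), 1)
            return w * edit_sim + (1 - w) * dice_bigrams(a, b)

        print(combined_similarity("receive", "recieve"))  # high: one transposition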

    The Use of Quadtree Range Domain Partitioning with Fast Double Moment Descriptors to Enhance FIC of Colored Image

    In this paper, an enhanced fractal image compression (FIC) system is proposed; it is based on using both symmetry prediction and block indexing to speed up the block matching process. The proposed FIC uses a quadtree as a variable range-block partitioning mechanism. Two criteria guide the partitioning decision: the first uses Sobel-based edge magnitude, whereas the second uses block contrast. A new set of moment descriptors is introduced; they differ from previously used descriptors in their ability to emphasize the weights of different parts of each block. The effectiveness of all possible combinations of double moment descriptors has been investigated. Furthermore, a fast computation mechanism for the moments is introduced, intended to reduce the overall computation cost. The results of tests applied to the system with both variable and fixed range-block partitioning indicated that the variable partitioning scheme produces better results than the fixed one (that is, 4 × 4 blocks): it achieves a higher compression ratio, is faster, and does not significantly decrease PSNR.
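
    The partitioning decision can be sketched as follows, assuming a block is split when its mean Sobel edge magnitude or its contrast exceeds a fixed threshold; the threshold values and recursion details are illustrative, not the paper's parameters.

        import numpy as np
        from scipy import ndimage

        def should_split(block, edge_thr=30.0, contrast_thr=20.0):
            """Quadtree split test: high edge activity or high contrast."""
            gx = ndimage.sobel(block.astype(float), axis=0)
            gy = ndimage.sobel(block.astype(float), axis=1)
            edge_mag = np.hypot(gx, gy).mean()
            contrast = block.max() - block.min()
            return edge_mag > edge_thr or contrast > contrast_thr

        def partition(img, y, x, size, min_size=4, leaves=None):
            """Recursively collect (y, x, size) range blocks for FIC matching."""
            if leaves is None:
                leaves = []
            block = img[y:y + size, x:x + size]
            if size > min_size and should_split(block):
                h = size // 2
                for dy in (0, h):
                    for dx in (0, h):
                        partition(img, y + dy, x + dx, h, min_size, leaves)
            else:
                leaves.append((y, x, size))
            return leaves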

    Audio compression using transforms and high order entropy encoding

    Digital audio requires transmitting large amounts of audio information through the most common communication systems, which in turn leads to challenges in both storage and archiving. In this paper, an efficient audio compression scheme is proposed; it depends on a combined transform coding scheme consisting of i) a bi-orthogonal (tap 9/7) wavelet transform to decompose the audio signal into low and multiple high sub-bands, ii) a DCT applied to the produced sub-bands to de-correlate the signal, iii) progressive hierarchical quantization of the combined transform output followed by traditional run-length encoding (RLE), and iv) LZW coding to generate the output bitstream. Peak signal-to-noise ratio (PSNR) and compression ratio (CR) were used as measures for a comparative analysis of the whole system's performance. Many audio test samples, of various sizes and with varying features, were utilized to test the performance behavior. The simulation results show the efficiency of these combined transforms when using LZW for data compression. The compression results are encouraging and show a remarkable reduction in audio file size with good fidelity.
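
    The pipeline can be sketched as below, using PyWavelets' 'bior4.4' as a stand-in for the tap-9/7 biorthogonal wavelet and a uniform quantization step in place of the paper's progressive hierarchical quantizer; both substitutions are assumptions.

        import numpy as np
        import pywt
        from scipy.fft import dct

        def run_length_encode(values):
            """Classic RLE: list of (value, run) pairs, ready for LZW."""
            runs, prev, count = [], values[0], 0
            for v in values:
                if v == prev:
                    count += 1
                else:
                    runs.append((prev, count))
                    prev, count = v, 1
            runs.append((prev, count))
            return runs

        def compress(signal, levels=3, q_step=8.0):
            # Wavelet stage: one low sub-band plus several high sub-bands.
            subbands = pywt.wavedec(signal, "bior4.4", level=levels)
            stream = []
            for band in subbands:
                decorrelated = dct(band, norm="ortho")  # de-correlate each band
                quantized = np.round(decorrelated / q_step).astype(int)
                stream.extend(quantized.tolist())
            return run_length_encode(stream)            # next stage would be LZW

        encoded = compress(np.sin(np.linspace(0, 40 * np.pi, 4096)))
        print(len(encoded), "RLE pairs")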